Abstract


A definition of essential independence is proposed for sequences of polytomous items.
For items satisfying the reasonable assumption that the expected amount of credit awarded
increases with examinee ability, we develop a theory of essential unidimensionality which
closely parallels that of Stout. Essentially unidimensional item sequences can be shown
to have a unique (up to change-of-scale) dominant underlying trait, which can be
consistently estimated by a monotone transformation of the sum of the item scores. In more
general polytomous-response latent trait models (with or without ordered responses), an
M-estimator based upon maximum likelihood may be shown to be consistent for $\theta$ under
essentially unidimensional violations of local independence and a variety of
monotonicity/identifiability conditions. A rigorous proof of this fact is given, and the standard error
of the estimator is explored. These results suggest that ability estimation methods that
rely on the summation form of the log-likelihood under local independence should generally
be robust under essential independence, but standard errors may vary greatly from what
is usually expected, depending on the degree of departure from local independence. An
index of departure from local independence is also proposed.
1. Introduction


In the usual binary or dichotomous response formulation of item response theory
(IRT), the correctness of the $j$th item in a test or item sequence is indicated by a (random)
response variable $X_j$, taking on the value 1 for correct responses and the value 0 for
incorrect responses. This codes the examinee’s response with the score we wish to assign to that
response. In considering polytomous data, it is convenient to treat the coding and scoring
operations separately. For the $j$th polytomous item we will code $n$ possible response
categories with the arbitrary labels $x_{j0}, x_{j1}, \ldots, x_{j(n-1)}$, and indicate the examinee’s response
with the (random) response variable

$$X_j \in \{x_{j0}, x_{j1}, \ldots, x_{j(n-1)}\}.$$

For convenience in scoring the item, it is also useful to have a set of binary response
variables

$$Y_{jm} = \begin{cases} 1 & \text{if } X_j = x_{jm}, \\ 0 & \text{else.} \end{cases}$$

Note that for each $j$, $Y_{j0} + Y_{j1} + \cdots + Y_{j(n-1)} = 1$, and that any item scoring method $A_j$
that assigns the numerical score $a_{jm}$ to the category $x_{jm}$ may be expressed in terms of the
$Y$’s as

$$A_j = \sum_{m=0}^{n-1} a_{jm} Y_{jm}.$$
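The separation of coding and scoring described above can be sketched in a few lines of Python. This is only an illustration: the function names and the partial-credit weights are assumptions introduced here, and response categories are coded by their index $m = 0, \ldots, n-1$.

```python
def indicators(response, n):
    """Return (Y_j0, ..., Y_j(n-1)) for a response coded by its index m in 0..n-1."""
    if not 0 <= response < n:
        raise ValueError("response must be one of 0, ..., n-1")
    return [1 if m == response else 0 for m in range(n)]


def item_score(response, weights):
    """Item score A_j = sum_m a_jm * Y_jm for a single item."""
    y = indicators(response, len(weights))
    return sum(a * y_m for a, y_m in zip(weights, y))


# Hypothetical partial-credit weights a_j0, a_j1, a_j2 for a three-category item.
weights = [0.0, 0.5, 1.0]
y = indicators(1, 3)            # examinee chose the middle category
score = item_score(1, weights)  # picks out the weight of the chosen category
```

Because exactly one indicator equals 1, the score is simply the weight attached to the observed category, whatever scoring scheme is used.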

Finally, let $\mathbf{X}_J = (X_1, X_2, \ldots, X_J)$ be the vector of item responses on a test of length $J$
given by a randomly-chosen examinee, and let $\mathbf{x}_J = (x_1, x_2, \ldots, x_J)$ denote any particular
instance of $\mathbf{X}_J$.

The general form of an IRT model for $\mathbf{X}_J$ may then be expressed as

$$P[\mathbf{X}_J = \mathbf{x}_J] = \int P[\mathbf{X}_J = \mathbf{x}_J \mid \Theta = \theta]\, f(\theta)\, d\theta. \tag{1}$$
We follow Thissen and Steinberg (1986) in considering $\Theta = (\Theta_1, \ldots, \Theta_d)$, the latent trait
or trait vector, to be a random variable (vector); thus, $f(\theta)$ is this variable’s probability
density function for the population in question. The traditional IRT assumption of local
independence reads, for polytomous item response models,

$$P[\mathbf{X}_J = \mathbf{x}_J \mid \Theta = \theta] = \prod_{j=1}^{J} \prod_{m=0}^{n-1} P_{jm}(\theta)^{y_{jm}}, \tag{LI}$$

where the $y_{jm}$ are observed values of $Y_{jm}$ corresponding to each $x_j$, and
$P_{jm}(\theta) = P[X_j = x_{jm} \mid \Theta = \theta]$ are the response characteristic functions or, when $d = 1$, response characteristic
curves (RCC’s). There is no natural monotonicity assumption for general polytomous
models, although for those cases in which the responses are ordered from least correct to
most correct as $m$ increases, it seems reasonable to require that

$$P^*_{jm}(\theta) = \sum_{k=m}^{n-1} P_{jk}(\theta) \ \text{ is nondecreasing in } \theta \text{ for all } j, m, \tag{M}$$

that is, nondecreasing in each coordinate of $\theta$ with the other coordinates held fixed (these
cumulative response functions are considered by, for example, Samejima, 1972). Note
that $P^*_{jm}(\theta) = P[\text{response } m \text{ or greater} \mid \theta]$ is the binary item response function one would
obtain by dichotomizing the item so that response $m$ or greater is scored as 1 (correct) and
any lower response is scored as 0 (incorrect). When LI and M both hold for a $d$-dimensional
trait $\Theta$, we will write $d_L$ for $d$. We will be concerned mostly with $d_L = 1$ models in what
follows.
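As a concrete instance of condition M, the following Python sketch builds cumulative response functions of a logistic graded-response form and checks numerically that each is nondecreasing on a grid of trait values. The logistic form, the parameter values, and the function names are assumptions made for illustration; they are not taken from the development above.

```python
import math


def cumulative_probs(theta, a, thresholds):
    """P*_jm(theta), m = 0..n-1: probability of response m or greater (P*_j0 = 1)."""
    return [1.0] + [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]


def category_probs(theta, a, thresholds):
    """P_jm(theta) = P*_jm(theta) - P*_j(m+1)(theta), with P*_jn = 0."""
    star = cumulative_probs(theta, a, thresholds) + [0.0]
    return [star[m] - star[m + 1] for m in range(len(star) - 1)]


def satisfies_M(a, thresholds, grid):
    """Check on a grid that every cumulative curve P*_jm is nondecreasing."""
    curves = [cumulative_probs(t, a, thresholds) for t in grid]
    return all(curves[i + 1][m] >= curves[i][m]
               for i in range(len(grid) - 1)
               for m in range(len(curves[0])))


# Hypothetical four-category item with increasing thresholds.
a, thresholds = 1.2, [-1.0, 0.0, 1.5]
grid = [-4.0 + 0.1 * i for i in range(81)]
probs = category_probs(0.3, a, thresholds)
ok = satisfies_M(a, thresholds, grid)
```

A grid check of this kind can only suggest monotonicity; for this particular logistic form it also follows directly from the fact that each cumulative curve is an increasing function of $\theta$.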

This paper has two aims. First, we wish to present and explore a definition of essential
independence (EI) for polytomous item response sequences. EI, proposed for binary item
sequences by Stout (1987; 1990), is a weakening of LI that is useful when, as seems often
to be the case in real-life tests, there is a dominant underlying latent trait for the items but
the presence of various minor traits prevents LI from holding exactly. For items satisfying
a condition like M above, the theory of essential unidimensionality and estimation of the
dominant unidimensional latent trait based on raw test score proceeds much as in Stout
(1990). This is the subject of Sections 2 and 3.

Our second aim is to explore maximum likelihood estimates calculated under the
assumption that LI holds when in fact only EI holds. Section 4 contains the basic result:
the MLE calculated under LI remains consistent for $\theta$ under EI, subject only to regularity
conditions and a natural identifiability condition. Thus, maximum likelihood estimation
is robust against this realistic violation of local independence.

Monotone unidimensional local independence models will, and should, continue to be
used as basic psychometric tools since they are attractive to the intuition and lead to
explicit, analytically straightforward likelihoods. However, it is widely accepted that they
oversimplify the latent structure of most tests in the real world. In some situations, the
way the latent structure violates this simple model may be estimated and exploited, but
in many situations it may be impossible or overly expensive to collect the data needed to
ferret out a multidimensional latent structure. The discussion of this issue by Drasgow
and Parsons (1983) is especially relevant here. Essential independence is a way of
characterizing unidimensional stability without knowing the true likelihood function (latent
structure). The importance of the robustness result of Section 4 is that it suggests that
ability estimation methods based on the simple LI model continue to work in situations
in which the latent factors causing strict LI to be violated are sufficiently minor that EI
holds.

Despite this robustness in consistency, there is little robustness in variability. In
Section 5 we consider the standard error of the estimator of Section 4, showing that if
the departure from local independence is great enough, the estimator can fail to have the
usual standard error based on the information function, can fail to converge at the usual
$J^{-1/2}$ rate, and can even fail to be asymptotically normally distributed. An index of the
degree of departure from LI is proposed in Section 5 that can be used to calculate the
new standard error. LI-based estimators like the MLE can be expected to be close to
the examinee’s $\theta$ under realistic conditions if the test is long, but conventional methods
of assessing the standard errors of the estimates may be misleadingly optimistic in these
same realistic settings.

Gibbons, Bock, and Hedeker (1989) have developed a method of factor analyzing
dichotomous data with correlated specific factors that may be useful to obtain correct
standard error estimates in at least some IRT settings. An indication of how their method
might be used in the present context will be given in Section 5. Wainer and Wright (1980)
have also reported some success using jackknife standard error estimates to account for
extra variation in a $d_L = 1$ Rasch model due to guessing and “sleeping” behavior.

Also important in assessing the standard errors of ability estimators is the uncertainty
involved in estimating RCC’s. Tsutakawa and Soltys (1988) have incorporated RCC
uncertainty into posterior mean estimator standard errors under LI in the dichotomous case.
Adapting such methods to the EI setting will be of great importance in eventually
understanding the true error structure of estimated IRT models, but that is beyond our present
scope.

Although the results of this paper are stated and proved in the polytomous case, it
is expected that they will find greatest application in the dichotomous setting, where IRT
techniques have been most fully developed. For the reader’s convenience, the main points
of Sections 4 and 5 are restated for dichotomous responses in Section 6; these results
are also new in the dichotomous case. Finally, Section 7 summarizes the conclusions of
our work, and indicates extensions to other popular LI-based trait estimators, such as the
posterior mode and posterior mean.

2. Essential Independence and Item Sequences

The notions of essential independence and essential unidimensionality were introduced
in Stout (1987) and explored in the dichotomous case by Stout (1990) and Junker (1988). In
the factor analytic tradition, but with a decidedly non-factor-analytic perspective, Stout
seeks a criterion by which only dominant dimensions can be counted. When only one
dominant dimension is counted, the test is said to be essentially unidimensional.

The fundamental idea behind essential independence is that a trait vector $\Theta$ is
dominant if, after conditioning on $\Theta$, the residual covariances among the items are small on
average. This parallels the idea, in traditional IRT, that if the latent space is “complete”,
then the residual covariances are all zero. A partial answer to the question of how small
the residual covariances must be for $\Theta$ to dominate has been provided by Stout’s (1987)
statistical procedure for assessing essential unidimensionality in a fixed, finite set of
dichotomous items. If the residual covariances are small but not zero, $\Theta$ continues to have
many properties of LI latent trait vectors: it is strongly related to the total test score, it
is better and better identified as the test length grows, etc.

To examine properties of $\Theta$ and of $\theta$ estimators as test length grows, it is necessary to
embed the finite test $X_1, \ldots, X_J$ in an infinite collection of items $\mathbf{X}$. For example, results
of Levine (1989) make it clear that not even the distribution of $\Theta$ is completely identifiable
from a finite-length test, let alone particular examinees’ $\theta$ vectors. Such an embedding is
implicit even in traditional discussions of IRT trait estimation (e.g., Birnbaum, 1968, pp.
455-457; Lord, 1980, p. 59).

The substantive interpretation of this embedding varies from application to
application. In some settings it may be reasonable to imagine that the process used to generate
the test $X_1, \ldots, X_J$ (which may, for example, involve many item writers and reviewers
generating items of the same character and in the same way) is simply continued to
produce more and more items. Or, it may be reasonable to think of $X_1, \ldots, X_J$ as forming a
(stratified) sample from a large item pool, as when test forms are constructed by hand
according to a test specification matrix, or constructed “on the fly” in computerized adaptive
testing (CAT). Other interpretations may also be appropriate.

All such interpretations may be encompassed in the following framework. In practice,
a test form of length $J + 1$ is seldom obtained by simply finding a form of length $J$ and
tacking one more item onto the end of it. Instead, forms of differing lengths, intended to
measure the same construct, will be constructed at different times according to slightly
different design specifications. Thus, in attempting to understand what is meant by letting
the test length $J$ grow, we may consider a sequence of tests

$$\begin{aligned}
\mathbf{X}_1 &= (X_{11}), \\
\mathbf{X}_2 &= (X_{21}, X_{22}), \\
\mathbf{X}_3 &= (X_{31}, X_{32}, X_{33}), \\
&\ \ \vdots \\
\mathbf{X}_J &= (X_{J1}, X_{J2}, X_{J3}, \ldots, X_{JJ}),
\end{aligned}$$
in which the test of length $J$ need not be a subtest of the test of length $J + 1$, for any $J$.
The only requirement here is that each test be designed to measure the same construct.
LI and other properties of the traditional IRT model extend in a natural way to such a
sequence of tests by requiring that they hold in every test $\mathbf{X}_J$ in the sequence. We will
abstract the idea that the tests “measure the same construct” by assuming that $\Theta$ is the
same from test to test, and that when an item appears in more than one $\mathbf{X}_J$, it has the
same response curves each time it appears.

This framework allows us to make mathematically rigorous statements about the
identifiability, uniqueness, and estimation of dominant latent traits as test length grows. It
is justifiable insofar as it helps crystallize ideas about finite-length tests with both dominant
and minor dimensions, or it suggests ways to improve the analysis of real tests. The sense
in which $\Theta$ is the dominant influence, essential independence, will be carefully defined in
the next section. For now we remark that it is not necessary to arrange the items within
$\mathbf{X}_J$ in any particular order to achieve this. Rather, essential independence requires that
the relative influence of minor factors not included in $\Theta$ be weaker (through cancellation
between items, moderation within items, etc.) in longer tests than in shorter ones.

Formally, this framework leads to a rather messy notation, since it adds a “test
index” $J$ to all quantities under discussion: $a_{jm}$ becomes $a_{Jjm}$, $A_j$ becomes $A_{Jj}$, etc. For
simplicity’s sake, we will retain the notation of Section 1 in what follows, and speak
informally of embedding the fixed test $\mathbf{X}_J$ as the first $J$ items in a single infinite item sequence
$\mathbf{X} = (X_1, X_2, X_3, \ldots)$. The reader should bear in mind that the results below also apply
to the more general framework described above.


3. Essential Independence for Polytomous Items

The traditional approach in IRT is to say that a latent trait (vector) $\Theta$ completely
controls the interesting variation in the item responses if LI and M hold. In contrast,
we would like to be able to determine whether the latent vector $\Theta$ is the dominant
influence underlying the item responses. Moreover, $\Theta$ should dominate regardless of how
the responses are scored. Thus, it is appropriate to consider an arbitrary scoring scheme
$\{a_{jm}\}$ and corresponding item scores $A_j$ subject only to the constraint that there is some
$M < \infty$ such that $|a_{jm}| \le M$ for all $j, m$. All of the scoring schemes considered below
will be bounded in this manner. If $\Theta$ is to be the dominant latent trait vector, we should
at least require that the variation of the raw score, $\bar{A}_J = \frac{1}{J} \sum_{j=1}^{J} A_j$, be small when we
condition on $\Theta$, as $J \to \infty$.

Definition 3.1. The sequence of polytomous items $\mathbf{X}$ is essentially independent (EI) with
respect to the latent trait(s) $\Theta$ if and only if, for every bounded scoring scheme $\{a_{jm}\}$ and
every $\theta$,

$$\lim_{J \to \infty} \frac{1}{J^2} \sum_{i=1}^{J} \sum_{j=1}^{J} \mathrm{Cov}(A_i, A_j \mid \Theta = \theta) = 0. \tag{EI}$$

This definition of EI for polytomous items, which is equivalent to requiring that
$\lim_{J \to \infty} \mathrm{Var}(\bar{A}_J \mid \Theta = \theta) = 0$ for every bounded scoring scheme, directly generalizes Stout’s
definition of strong EI for binary items (Definition 3.5, Stout, 1990). Stout’s various
definitions of essential independence are likely not equivalent in general, but they are equivalent
when the residual covariances are nonnegative (as seems plausible in many educational
testing contexts; see the discussion following Theorem 5.1 below). Only the strong EI
definition generalizes naturally to the polytomous case, and for this reason it is preferred
in this paper.
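To see how the averaged covariance in the EI condition behaves, consider a purely hypothetical block ("testlet") pattern in which every item score has conditional variance $v$, items in the same block have conditional covariance $c$, and items in different blocks are conditionally uncorrelated. The Python sketch below evaluates $\frac{1}{J^2} \sum_i \sum_j \mathrm{Cov}(A_i, A_j \mid \Theta = \theta)$ directly; the values of $v$ and $c$, the block structure, and the function name are all assumptions chosen for illustration.

```python
def mean_score_variance(J, block_size, v=0.25, c=0.05):
    """(1/J^2) * sum_i sum_j Cov(A_i, A_j | theta) for a block covariance pattern."""
    total = 0.0
    for i in range(J):
        for j in range(J):
            if i == j:
                total += v                           # conditional variance of A_i
            elif i // block_size == j // block_size:
                total += c                           # same block: residual covariance
    return total / J**2


lengths = (8, 80, 800)
# Blocks of fixed size 4: the minor factors are diluted as the test grows.
fixed = [mean_score_variance(J, 4) for J in lengths]
# Two blocks, each holding half the test: the minor factors never wash out.
growing = [mean_score_variance(J, J // 2) for J in lengths]
```

With blocks of fixed size the average tends to 0 as $J$ grows, as EI requires, whereas with two blocks that each contain half the test it tends to $c/2 > 0$, so a pattern of that kind would not be essentially independent.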

Clearly, every LI item sequence is EI (under LI the covariances between distinct items
vanish, and the remaining variances are bounded). Since the covariances above are unaffected by
shifting the coefficients from $a_{jm}$ to $a'_{jm} = a_{jm} + c_j$, for any constants $c_j$, we see that
Definition 3.1 is equivalent to ones in which only positive, bounded $a_{jm}$ are allowed; or
only bounded $a_{jm}$ for which at least one response from each item has $a_{jm} = 0$ are allowed;
etc. Now, consider the expected item scores,

$$A_j(\theta) = E[A_j \mid \Theta = \theta] = \sum_{m=0}^{n-1} a_{jm} P_{jm}(\theta),$$

and the expected raw test score, or test characteristic function,

$$\bar{A}_J(\theta) = \frac{1}{J} \sum_{j=1}^{J} A_j(\theta).$$

Theorem 3.1. The following are equivalent, for a sequence of polytomous items $\mathbf{X}$:

(a) $\mathbf{X}$ is EI with respect to $\Theta$;

(b) For each bounded scoring scheme $\{a_{jm}\}$ and each $\theta$,

$$\lim_{J \to \infty} E\left[ \left( \bar{A}_J - \bar{A}_J(\theta) \right)^2 \mid \Theta = \theta \right] = 0;$$

(c) For each bounded scoring scheme $\{a_{jm}\}$ and each $\theta$,

$$\frac{1}{J} \sum_{j=1}^{J} \sum_{m=0}^{n-1} a_{jm} \left[ Y_{jm} - P_{jm}(\theta) \right] \to 0$$

in probability, given $\Theta = \theta$, as $J \to \infty$.

Proof: The proof is an easy extension of the proof of Theorem 3.2 of Stout (1990). □

Estimating $\bar{A}_J(\theta)$ is not necessarily useful unless $\Theta$ is unidimensional. Just as with
binary items, a particular value $\bar{A}_J(\theta)$ may be possible for examinees with radically different
$\theta$’s due to compensation among the components of $\theta$. Hereafter, we will restrict ourselves
to unidimensional traits $\Theta$ and consider estimating each examinee’s $\theta$.

When $\Theta$ is unidimensional, some sort of monotonicity condition becomes useful, so
that we can estimate $\theta$ with $\hat{\theta}_J = \bar{A}_J^{-1}(\bar{A}_J)$, where $\bar{A}_J^{-1}(\cdot)$ is the inverse of $\bar{A}_J(\theta)$. (In the
usual binary setting $\bar{A}_J^{-1}(\bar{A}_J) = \bar{P}_J^{-1}(\bar{X}_J)$, for example.) In models that award partial
credit for partially-correct answers, it seems natural to require that the expected amount
of partial credit awarded on each item increases with the level of the latent trait:

$$A_j(\theta) \ \text{ is nondecreasing in } \theta \text{ for each } j. \tag{M$'$}$$

What is the relationship between condition M in Section 1 and M$'$ above? We will
call a sequence of items $\mathbf{X}$ for which the item response categories $\{x_{jm}\}$ are indexed so
that M holds an ordered-response item sequence. On the other hand, if a scoring scheme
$\{a_{jm}\}$ satisfies, for each $j$, $0 \le a_{j0} \le a_{j1} \le \cdots \le a_{j(n-1)}$, we will call it an ordered scoring
scheme. Then, with the convention that $a_{j(-1)} = 0$,

$$A_j(\theta) = \sum_{k=0}^{n-1} a_{jk} P_{jk}(\theta) = \sum_{m=0}^{n-1} \left( a_{jm} - a_{j(m-1)} \right) P^*_{jm}(\theta).$$

It follows that condition M is equivalent to M$'$ holding for every ordered scoring
scheme. M is a condition that has been considered for many parametric ordered-response
models. For example, Samejima (1972) has shown that M does hold for her graded-response
model, as well as for Bock’s (1972) nominal model constrained to apply to ordered-response
items (also, see Thissen and Steinberg, 1986). A somewhat milder form of monotonicity
called LAD is sufficient to build the estimator $\hat{\theta}_J$.
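The summation-by-parts identity that links the expected item score to the cumulative response functions (the sum over categories of $a_{jk} P_{jk}(\theta)$ rewritten as a sum of score increments times $P^*_{jm}(\theta)$, with $a_{j(-1)} = 0$) can be checked numerically. The Python sketch below does so for one hypothetical item at a fixed $\theta$; the weights, probabilities, and function names are illustrative assumptions.

```python
def expected_score_direct(weights, probs):
    """A_j(theta) = sum_k a_jk * P_jk(theta)."""
    return sum(a * p for a, p in zip(weights, probs))


def expected_score_cumulative(weights, probs):
    """A_j(theta) = sum_m (a_jm - a_j(m-1)) * P*_jm(theta), with a_j(-1) = 0."""
    n = len(weights)
    star = [sum(probs[m:]) for m in range(n)]   # P*_jm = sum over k >= m of P_jk
    prev = [0.0] + list(weights[:-1])           # a_j(m-1), using a_j(-1) = 0
    return sum((weights[m] - prev[m]) * star[m] for m in range(n))


weights = [0.0, 1.0, 2.0, 4.0]   # an ordered scoring scheme (illustrative values)
probs = [0.1, 0.2, 0.3, 0.4]     # category probabilities at some fixed theta
direct = expected_score_direct(weights, probs)
via_cumulative = expected_score_cumulative(weights, probs)
```

Since every increment $a_{jm} - a_{j(m-1)}$ is nonnegative for an ordered scoring scheme, the second form makes it visible why monotone cumulative curves yield a monotone expected item score.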

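Definition 3.2. The ordered scoring scheme $\{a_{jm}\}$ is asymptotically discriminating (AD)
if and only if there exists an $\epsilon > 0$ such that

$$\frac{1}{J} \sum_{j=1}^{J} \left( a_{j(n-1)} - a_{j0} \right) \ge \epsilon, \quad \forall\, J. \tag{AD}$$

The item sequence $\mathbf{X}$ is locally asymptotically discriminating (LAD) if and only if, for each
AD ordered scoring scheme $\{a_{jm}\}$, to every $\theta$ there corresponds an interval $N_\theta$ containing
$\theta$ and an $\epsilon_\theta > 0$ such that

$$\frac{\bar{A}_J(t) - \bar{A}_J(\theta)}{t - \theta} \ge \epsilon_\theta, \quad \forall\, t \in N_\theta,\ t \ne \theta,\ \forall\, J. \tag{LAD}$$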

This generalizes LAD for binary item sequences as presented in Definition 3.8 of Stout
(1990). Note that LAD imposes a minimum discrimination condition on the test
characteristic curves at each $\theta$, as $J \to \infty$. Also, the items themselves need not have ordered
responses; only the scoring schemes $\{a_{jm}\}$ need be ordered. LAD may be viewed as
naturally extending the interpretation of M (that the expected amount of credit awarded
increases with the examinee’s ability) from a fixed-length test to an item sequence,
without strictly requiring M to hold for every item in the sequence.

Theorem 3.2. If the polytomous item sequence $\mathbf{X}$ satisfies EI and LAD with respect to the
unidimensional trait $\Theta$, then for each $\theta$ and each $\epsilon > 0$, if $\{a_{jm}\}$ is a bounded AD ordered
scoring scheme, then

$$\lim_{J \to \infty} P\left[\, \left| \bar{A}_J^{-1}(\bar{A}_J) - \theta \right| > \epsilon \mid \Theta = \theta \,\right] = 0.$$

Proof: Virtually the same as the proof of Theorem 3.6 of Stout (1990). □
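The estimator in Theorem 3.2 inverts the test characteristic curve at the observed mean item score. A minimal Python sketch of that inversion by bisection is given below. It assumes, purely for illustration, items of a logistic graded-response form with an ordered scoring scheme, so that the characteristic curve is strictly increasing; the item parameters, the search interval, and the function names are hypothetical.

```python
import math


def expected_item_score(theta, a, thresholds, weights):
    """A_j(theta) = sum_m a_jm * P_jm(theta) for a logistic graded-response item."""
    star = [1.0] + [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds] + [0.0]
    return sum(weights[m] * (star[m] - star[m + 1]) for m in range(len(weights)))


def tcc(theta, items):
    """Test characteristic curve: the average of the expected item scores."""
    return sum(expected_item_score(theta, *item) for item in items) / len(items)


def invert_tcc(mean_score, items, lo=-8.0, hi=8.0, tol=1e-10):
    """Solve tcc(theta) = mean_score by bisection, assuming tcc is increasing on [lo, hi]."""
    if not tcc(lo, items) <= mean_score <= tcc(hi, items):
        raise ValueError("mean score is outside the range attained on [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if tcc(mid, items) < mean_score:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Ten hypothetical three-category items scored 0, 1, 2.
items = [(1.0 + 0.1 * j, [-1.0 + 0.2 * j, 0.5 + 0.2 * j], [0.0, 1.0, 2.0])
         for j in range(10)]
theta_true = 0.7
theta_hat = invert_tcc(tcc(theta_true, items), items)
```

Here the curve is inverted at its own value, so the sketch only confirms that the inversion recovers $\theta$; with real data the observed mean score would be supplied instead, and it may fall outside the attainable range for extreme response patterns.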

Theorem 3.3. If the polytomous item sequence $\mathbf{X}$ satisfies EI and LAD with respect to the
unidimensional trait $\Theta$, and satisfies EI with respect to another latent trait $\Gamma$, then there
exists a nondecreasing function $g$ such that

$$P[\Theta = g(\Gamma)] = 1.$$

Proof: Follows Theorem 3.3 of Stout (1990) or Theorem 2.4 of Junker (1988). □

Theorems 3.1 through 3.3 show that if EI and LAD hold, we can estimate a unique
dominant latent trait with any reasonable $\bar{A}_J^{-1}(\bar{A}_J)$: any other dominant trait we might
find will be a change-of-scale of the trait we have estimated with $\bar{A}_J^{-1}(\bar{A}_J)$. (This is the same
level of trait uniqueness as exists under the general $d_L = 1$ model, although particular
parametric models, for example the Rasch model, may possess additional scale
properties.) Since under EI and LAD we can identify and estimate a unique unidimensional
dominant trait in the item response data, we will call this situation essentially
unidimensional, $d_E = 1$. When no single dominant trait exists in this sense, we will write $d_E > 1$.


4. Maximum Likelihood Ability Estimation

Often it is desired to estimate individuals’ $\theta$ values, treated as parameters in the
conditional model

$$P[\mathbf{X}_J = \mathbf{x}_J \mid \Theta = \theta] = \prod_{j=1}^{J} \prod_{m=0}^{n-1} P_{jm}(\theta)^{y_{jm}},$$

where $y_{jm} = 1$ when $x_j = x_{jm}$, and 0 otherwise (i.e., $y_{jm}$ are the observed values of
$Y_{jm}$). If the polytomous item sequence $\mathbf{X}$ does not satisfy LAD, the estimators described
in Section 3 may not exist, let alone be consistent for $\theta$. Even when LAD holds, it may be
desirable to have a more efficient estimator than $\bar{A}_J^{-1}(\bar{A}_J)$.

One common method of estimating individual examinees’ abilities is via maximum
likelihood, treating each examinee’s $\theta$ as an unknown parameter to be estimated and the
RCC’s as known. When LI holds, the maximum likelihood estimator (MLE) $\hat{\theta}_J$ is known
to be a consistent estimator for $\theta$ as $J \to \infty$, and has good asymptotic distribution
properties (asymptotic normality, efficiency, etc.), assuming that the RCC’s are known (e.g.,
Lehmann, 1983). We wish to investigate the behavior of $\hat{\theta}_J$ computed under the (false)
assumption of LI with respect to $\Theta$ when the item sequence satisfies EI. Technically, $\hat{\theta}_J$ is
called an M-estimator since it is no longer based on the true likelihood, which is unknown
under EI (e.g., Serfling, 1980, pp. 243 ff.). However, for convenience we will continue to
call $\hat{\theta}_J$ the MLE, since it is based on maximizing a (wrong) likelihood.

There are two reasons for working with the MLE. First, it is commonly used for
examinee scoring in applied IRT work, so we are compelled to know its behavior under realistic
violations of LI. Second, the behavior of the MLE may be taken to be representative of
the behavior of other likelihood-based methods. Our work with the MLE is intended to
suggest that similar robustness to departures from LI within an EI framework could be
expected of other popular estimators and predictors, such as estimators of the posterior
mode and posterior mean (e.g., Samejima, 1969; Bock & Mislevy, 1982; Lord, 1986). This
point will be taken up again in the discussion in Section 7.

Let us now turn to the requirements for consistency of $\hat{\theta}_J$, the convergence of $\hat{\theta}_J$ to $\theta$
as $J$ grows. Assuming (incorrectly) that LI holds with respect to $\Theta$, the log-likelihood for
estimating one examinee’s $\theta$ based on his or her item responses $\mathbf{X}_J$ (or equivalently, the
response-category indicators $Y_{jm}$) is

$$\ell_J(\theta) = \log \prod_{j=1}^{J} \prod_{m=0}^{n-1} P_{jm}(\theta)^{Y_{jm}} = \sum_{j=1}^{J} \sum_{m=0}^{n-1} Y_{jm} \lambda_{jm}(\theta),$$

where $\lambda_{jm}(\theta) = \log P_{jm}(\theta)$. Thus, $\hat{\theta}_J$ must satisfy the likelihood equation

$$\ell'_J(\theta) = \sum_{j=1}^{J} \sum_{m=0}^{n-1} \lambda'_{jm}(\theta) Y_{jm} = 0. \tag{2}$$

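Under LI, the fact that $\frac{1}{J} \ell'_J(\theta) \approx 0$ as $J \to \infty$ allows us to locate a root $\hat{\theta}_J$ of (2) near the
examinee’s $\theta$. Under EI, Theorem 3.1(c) ensures that

$$\frac{1}{J} \ell'_J(\theta) = \frac{1}{J} \sum_{j=1}^{J} \sum_{m=0}^{n-1} \lambda'_{jm}(\theta) \left( Y_{jm} - P_{jm}(\theta) \right) \to 0 \tag{3}$$

in probability, given $\Theta = \theta$, as long as the scoring scheme $a_{jm} = \lambda'_{jm}(\theta)$ is bounded
uniformly in $j$ and $m$ for each $\theta$. (The first equality holds because
$\sum_m \lambda'_{jm}(\theta) P_{jm}(\theta) = \sum_m P'_{jm}(\theta) = 0$.) Hence, we can expect to find a root $\hat{\theta}_J$ of (2) near $\theta$ under
EI as well. The dependence of $a_{jm}$ on $\theta$ here is irrelevant, since we are conditioning on
$\Theta = \theta$ fixed.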
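Locating a root of the likelihood equation (2) can be sketched in Python as follows. The sketch assumes, for illustration only, items of a logistic graded-response form, for which the derivative of the log-likelihood is decreasing in $\theta$, so a simple bisection on a fixed interval suffices; the item parameters, the response pattern, the interval, and the function names are all hypothetical.

```python
import math


def probs_and_derivs(theta, a, thresholds):
    """Category probabilities P_jm(theta) and their derivatives P'_jm(theta)."""
    star = [1.0] + [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds] + [0.0]
    dstar = [a * s * (1.0 - s) for s in star]   # derivative of each cumulative curve
    n = len(thresholds) + 1
    probs = [star[m] - star[m + 1] for m in range(n)]
    derivs = [dstar[m] - dstar[m + 1] for m in range(n)]
    return probs, derivs


def score(theta, items, responses):
    """l'_J(theta) = sum_j sum_m Y_jm * lambda'_jm(theta), with lambda'_jm = P'_jm / P_jm."""
    total = 0.0
    for (a, thresholds), x in zip(items, responses):
        probs, derivs = probs_and_derivs(theta, a, thresholds)
        total += derivs[x] / probs[x]           # only the observed category contributes
    return total


def solve_likelihood_equation(items, responses, lo=-6.0, hi=6.0, tol=1e-10):
    """Bisection for a root of l'_J on [lo, hi]; returns None if there is no sign change."""
    if score(lo, items, responses) <= 0.0 or score(hi, items, responses) >= 0.0:
        return None
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if score(mid, items, responses) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


items = [(1.0 + 0.1 * j, [-0.5, 0.5]) for j in range(6)]   # six three-category items
responses = [0, 1, 2, 1, 2, 0]                             # observed category indices
theta_hat = solve_likelihood_equation(items, responses)
all_top = solve_likelihood_equation(items, [2] * 6)        # every response in the top category
```

For the mixed response pattern a root is found inside the interval, while for the pattern consisting only of top-category responses the derivative of the log-likelihood stays positive and no root exists, which is one way the likelihood equation can fail to have a solution on a short test.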

To obtain the limit (3) and similar limits needed for consistency of $\hat{\theta}_J$, we assume that
for all $\theta$, there exists an interval $B_\theta$ containing $\theta$ and a constant $M_\theta < \infty$, such that

$$|\lambda'_{jm}(t)|,\ |\lambda''_{jm}(t)|,\ |\lambda'''_{jm}(t)| \le M_\theta \quad \forall\, t \in B_\theta,\ \forall\, j, m. \tag{4}$$

Condition (4) is really a fairly mild modeling assumption. For example, in the binary
three parameter logistic model it would be satisfied if all the difficulty and discrimination
parameters were bounded in absolute value.
A second important consideration in likelihood-based (or indeed any) estimation is
identifiability of the parameter. The criterion used for identifiability in Section 3, LAD,
is not necessarily appropriate when the response categories are unordered. Instead, it is
typical and reasonable to require that for each $\theta$, there exists an $\epsilon_\theta > 0$ such that

$$\bar{I}_J(\theta) = \frac{1}{J} \sum_{j=1}^{J} \sum_{m=0}^{n-1} \lambda'_{jm}(\theta) P'_{jm}(\theta) \ge \epsilon_\theta, \quad \forall\, J. \tag{5}$$

$\bar{I}_J(\theta)$ is, of course, the usual test information function. If (5) holds, there is enough
identifiability for the MLE to work. The following proposition gives several sufficient
criteria for identifiability in this sense.
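The average information in (5) can be evaluated directly once the response curves and their derivatives are available, using $\lambda'_{jm}(\theta) P'_{jm}(\theta) = P'_{jm}(\theta)^2 / P_{jm}(\theta)$. The Python sketch below does this for items of an assumed logistic graded-response form; the parameter values and function names are illustrative only.

```python
import math


def probs_and_derivs(theta, a, thresholds):
    """Category probabilities P_jm(theta) and their derivatives P'_jm(theta)."""
    star = [1.0] + [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds] + [0.0]
    dstar = [a * s * (1.0 - s) for s in star]   # derivative of each cumulative curve
    n = len(thresholds) + 1
    probs = [star[m] - star[m + 1] for m in range(n)]
    derivs = [dstar[m] - dstar[m + 1] for m in range(n)]
    return probs, derivs


def average_information(theta, items):
    """I_bar_J(theta) = (1/J) * sum_j sum_m P'_jm(theta)^2 / P_jm(theta)."""
    total = 0.0
    for a, thresholds in items:
        probs, derivs = probs_and_derivs(theta, a, thresholds)
        total += sum(d * d / p for p, d in zip(probs, derivs))
    return total / len(items)


# A single binary logistic item evaluated at its own threshold: a^2 * P * (1 - P) = 1.
info_binary = average_information(0.0, [(2.0, [0.0])])

# Six hypothetical three-category items, evaluated on a small grid of theta values.
items = [(1.0 + 0.1 * j, [-0.5, 0.5]) for j in range(6)]
info_grid = [average_information(-2.0 + 0.5 * i, items) for i in range(9)]
```

In the binary case the expression reduces to the familiar $a^2 P (1 - P)$, which gives a quick check on the computation; condition (5) asks that such averages stay bounded away from zero at each $\theta$ as the test grows.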


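Proposition 4.1. If any of the following conditions hold, then (5) holds.

(a) $\forall\, \theta,\ \exists\, \epsilon_\theta > 0:\ \frac{1}{J} \sum_{j=1}^{J} \sum_{m=0}^{n-1} \frac{(P'_{jm}(\theta))^2}{P_{jm}(\theta)} \ge \epsilon_\theta,\ \forall\, J$;

(b) $\forall\, \theta,\ \exists\, \epsilon_\theta > 0:\ \frac{1}{J} \sum_{j=1}^{J} \sum_{m=0}^{n-1} (P'_{jm}(\theta))^2 \ge \epsilon_\theta,\ \forall\, J$;

(c) $\forall\, \theta,\ \exists\, \epsilon_\theta > 0:\ \frac{1}{J} \sum_{j=1}^{J} \sum_{m=0}^{n-1} |P'_{jm}(\theta)| \ge \epsilon_\theta,\ \forall\, J$;

(d) $\forall\, \theta,\ \exists\, \epsilon_\theta > 0:\ \frac{1}{J} \sum_{j=1}^{J} \sum_{m=1}^{n-1} P'_{jm}(\theta) \ge \epsilon_\theta,\ \forall\, J$.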

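Proof: Condition (a) is exactly (5); condition (b) suffices by (a) and the fact that
$1 / P_{jm}(\theta) \ge 1$ always holds; condition (c) suffices by (b) and the fact that

$$\frac{1}{J} \sum_{j=1}^{J} \sum_{m=0}^{n-1} |P'_{jm}(\theta)|^2 \ge \frac{1}{n} \left\{ \frac{1}{J} \sum_{j=1}^{J} \sum_{m=0}^{n-1} |P'_{jm}(\theta)| \right\}^2,$$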


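by Jensen’s inequality (Ash, 1972, p. 287). Finally, condition (d) suffices by noting that,
since $P'_{j0}(\theta) = -\sum_{m=1}^{n-1} P'_{jm}(\theta)$,

$$\sum_{m=0}^{n-1} \frac{(P'_{jm}(\theta))^2}{P_{jm}(\theta)} \ge \frac{(P'_{j0}(\theta))^2}{P_{j0}(\theta)} = \frac{1}{P_{j0}(\theta)} \left\{ \sum_{m=1}^{n-1} P'_{jm}(\theta) \right\}^2,$$

and, using Jensen’s inequality again,

$$\frac{1}{J} \sum_{j=1}^{J} \frac{1}{P_{j0}(\theta)} \left\{ \sum_{m=1}^{n-1} P'_{jm}(\theta) \right\}^2 \ge \left\{ \frac{1}{J} \sum_{j=1}^{J} \sum_{m=1}^{n-1} P'_{jm}(\theta) \right\}^2. \quad □$$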
The most interesting of the criteria in Proposition 4.1 is (d). Note that by taking
$a_{j0} = 0$ and $a_{jm} = 1$ for $m > 0$ in the definition of LAD, we see that if LAD holds and the
RCC’s are differentiable, then (5) holds also.

Each of the conditions M, LAD, and (5) represents an identifiability or detection
condition for the sequence $\mathbf{X}$ and latent trait $\Theta$, and they fit into a rather neat hierarchy for
essentially independent smooth IRT models. M is the most restrictive identification
condition; it imposes a highly interpretable condition on each item in the test which virtually
guarantees LAD. LAD is less restrictive, in that it imposes the interpretation of M at the
test characteristic curve level, not the level of individual items. Moreover, LAD implies
(5). The minimum information condition (5) is least interpretable, but has the advantage
of widest applicability. Moreover, as Theorem 4.1 below shows, if (5) holds, then $\hat{\theta}_J$
converges to $\theta$, given $\Theta = \theta$. This hierarchy is not new or deep mathematically, but serves to
illustrate the transition from intuitively appealing psychological models to adequate but
less pleasing statistical ones.

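Theorem 4.1. Let $\mathbf{X}$ be a polytomous item sequence satisfying EI, (4) and (5). Then there
exists a sequence $\{\hat{\theta}_J : J \ge J_\theta\}$ of roots of (2) such that

$$\lim_{J \to \infty} P\left[\, |\hat{\theta}_J - \theta| < \epsilon \mid \Theta = \theta \,\right] = 1,$$

for every $\epsilon > 0$.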

Note that the sequence $\hat\theta_J$ may not start at $J = 1$, and for small $J$, there may be no
solutions to (2). This is not a serious limitation; see Theorem 4.2 below. Also, when LAD
holds, the trait being estimated is the same dominant trait whose estimation was treated
in Section 3; this follows from Theorem 3.3. The novelty of the following Cramér-style
proof is that (local) independence is not assumed.

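Proof: Let $\epsilon_0 > 0$ be arbitrary and fixed in advance. Without loss of generality, we assume
that $(\theta - \epsilon_0, \theta + \epsilon_0) \subset B_\theta$, where $B_\theta$ is the interval given in (4). Our goal is to obtain
roots of (2) in the interval $(\theta - \epsilon_0, \theta + \epsilon_0)$. The second-order Taylor polynomial for
$\frac{1}{J}\ell'_J(t)$ in $(\theta - \epsilon_0, \theta + \epsilon_0)$ is

\[
\begin{aligned}
\frac{1}{J}\ell'_J(t) &= \frac{1}{J}\ell'_J(\theta) + \frac{1}{J}(t - \theta)\,\ell''_J(\theta) + \frac{1}{2J}(t - \theta)^2\,\ell'''_J(\xi) \\
&= \frac{1}{J}\sum_{j=1}^{J}\sum_{m=0}^{n-1} \lambda'_{jm}(\theta)\,Y_{jm}
 + (t - \theta)\,\frac{1}{J}\sum_{j=1}^{J}\sum_{m=0}^{n-1} \lambda''_{jm}(\theta)\,Y_{jm}
 + \frac{(t - \theta)^2}{2}\,\frac{1}{J}\sum_{j=1}^{J}\sum_{m=0}^{n-1} \lambda'''_{jm}(\xi)\,Y_{jm},
\end{aligned}
\tag{6}
\]

where $\xi = \theta + r(t - \theta)$, for some $0 < r < 1$. We have already shown in (3) that
$\frac{1}{J}\ell'_J(\theta) \to 0$ in probability, given $\Theta = \theta$, under EI and (4). Similarly

\[ \frac{1}{J}\sum_{j=1}^{J}\sum_{m=0}^{n-1} \lambda''_{jm}(\theta)\,[Y_{jm} - P_{jm}(\theta)] \to 0. \]

Since $E[\frac{1}{J}\ell''_J(\theta) \mid \theta] = -\bar I_J(\theta)$, this implies that $\frac{1}{J}\ell''_J(\theta) + \bar I_J(\theta) \to 0$ in probability, given
$\Theta = \theta$. Hence, using (4) again, we may rewrite (6) as

\[ \frac{1}{J}\ell'_J(t) = o_p(1) - (t - \theta)\,[\bar I_J(\theta) + o_p(1)] + \tfrac{1}{2}(t - \theta)^2\,\rho_J M_\theta, \]

where $\rho_J$ is a random quantity satisfying $|\rho_J| \le 1$, $o_p(1)$ denotes quantities tending to
0 in probability, given $\Theta = \theta$, and $\bar I_J(\theta)$ is bounded away from 0 by (5). Thus, for large
$J$, $\frac{1}{J}\ell'_J(t)$ is approximately linear near $\theta$, and with large probability is positive for some
$t_+ \in (\theta - \epsilon_0, \theta)$, negative for some $t_- \in (\theta, \theta + \epsilon_0)$, and by continuity equal to zero for some
$\hat\theta_J \in (\theta - \epsilon_0, \theta + \epsilon_0)$. Hence, there is a sequence $\hat\theta_J$ of solutions to (2) with the property
that for any $\epsilon_0 > 0$,

\[ P[\,|\hat\theta_J - \theta| < \epsilon_0 \mid \Theta = \theta\,] \to 1, \]

as $J \to \infty$. Further details may be found in Serfling (1980, pp. 143-148). □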

In general, we do not expect the roots $\hat\theta_J$ of (2) to be unique (e.g., Samejima, 1973).
Moreover, among the multiple roots of (2), there is likely to be only one consistent root
sequence: Foutz (1977) proves that if $\hat\theta_{1,J}$ and $\hat\theta_{2,J}$ are both consistent root sequences,
then under LI, $P[\hat\theta_{1,J} = \hat\theta_{2,J} \mid \Theta = \theta] \to 1$ as $J \to \infty$. Thus, the situation, even under LI,
is opposite that portrayed by Lord (1980, p. 59): rather than being optimistic that the
roots should be eventually unique, one might be pessimistic that multiple roots continue
to occur as $J \to \infty$, and that only one of these roots, for each $J$, brings us closer to the true $\theta$.

This is not a practical problem, however. We shall see next that the standard practice
of approximating a root of (2) by Newton's method produces estimates that are consistent
for $\theta$ under EI, even though the MLE and the Newton's method estimate of it were
computed under the assumption LI. Thus, familiar numerical methods continue to be useful
in estimating $\theta$ under EI.

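Theorem 4.2. Suppose the assumptions of Theorem 4.1 hold, and let $\tilde\theta_J$ be any sequence
of consistent estimates for $\theta$, given $\Theta = \theta$. Then the Newton's method improvement,

\[ \tilde\theta'_J = \tilde\theta_J - \frac{\ell'_J(\tilde\theta_J)}{\ell''_J(\tilde\theta_J)}, \tag{7} \]

is also consistent for $\theta$.

Proof: As in Lehmann (1983, p. 423) we may substitute a Taylor expansion of $\ell'_J(\tilde\theta_J)$
about $\theta$ into (7) to obtain

\[ \tilde\theta'_J - \theta = -\frac{\ell'_J(\theta)}{\ell''_J(\tilde\theta_J)} + (\tilde\theta_J - \theta) \cdot o_p(1). \tag{8} \]

The second term on the right clearly tends to zero as $\tilde\theta_J \to \theta$. For the first term, a continuity
argument shows that $-\frac{1}{J}\ell''_J(\tilde\theta_J) \approx \bar I_J(\theta) > 0$, and we know from (3) that $\frac{1}{J}\ell'_J(\theta) \to 0$. □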

Clearly, the assertion of the theorem can be iterated to show that the result of,
say, twenty Newton steps from $\tilde\theta_J$ would also be consistent. Such an estimator should be
closer, in some sense, to the consistent roots found in Theorem 4.1. Newton's method
requires an initial guess $\tilde\theta_J$; when LAD holds, $\tilde\theta_J = \bar A_J^{-1}(\bar A_J)$ is a natural choice, in view
of Theorem 3.2.
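
As an illustration of the Newton improvement (7), the following Python sketch applies it in the dichotomous special case. The two-parameter logistic ICC, the parameter names `a` and `b`, and all function names are assumptions made for this example only; they do not come from the text, and the polytomous case would replace the score and curvature by sums over response categories.

```python
import math

def icc(theta, a, b):
    """Assumed two-parameter logistic ICC P_j(theta); not a model from the paper."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def score(theta, x, a, b):
    """First derivative of the LI log-likelihood; for this ICC, lambda'_j(theta) = a_j."""
    return sum(aj * (xj - icc(theta, aj, bj)) for xj, aj, bj in zip(x, a, b))

def curvature(theta, a, b):
    """Second derivative of the LI log-likelihood (negative for this model)."""
    total = 0.0
    for aj, bj in zip(a, b):
        p = icc(theta, aj, bj)
        total -= aj * aj * p * (1.0 - p)
    return total

def newton_improve(theta_start, x, a, b, steps=1):
    """Apply the update theta - l'(theta) / l''(theta) `steps` times."""
    theta = theta_start
    for _ in range(steps):
        theta = theta - score(theta, x, a, b) / curvature(theta, a, b)
    return theta
```

With `steps=1` this is the single improvement described in the theorem; larger values of `steps` iterate it, as in the remark about twenty Newton steps.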


5.  Standard  Error  of  the  MLE 

In the usual LI ability estimation theory, we expect that the sequence $\hat\theta_J$ will be
asymptotically normal and efficient,

\[ J^{1/2}(\hat\theta_J - \theta) \sim AN(0, 1/\bar I_J(\theta)), \tag{9} \]

as $J \to \infty$, where $\bar I_J(\theta)$ is the traditional test information function introduced in (5). A
result like (9) identifying the standard error of $\hat\theta_J$ is needed to do statistical inference using
$\hat\theta_J$, or indeed merely to know how well to trust $\hat\theta_J$ as an estimator of $\theta$ for particular fixed
$J$ that arise in applications. However, (9) may fail in the essentially unidimensional case
in two interesting ways: it may be that asymptotic normality holds but the asymptotic
variance is no longer $\bar I_J(\theta)^{-1}$; or it may be that asymptotic normality fails completely.
When asymptotic normality does hold, we shall see that the deviation from the efficient
variance is controlled by the quantity

\[ C_J(\theta) = \frac{2}{J}\sum_{i=1}^{J}\sum_{j=1}^{i-1} \mathrm{Cov}(A_i, A_j \mid \Theta = \theta), \tag{10} \]

where the item scores $A_j$ are constructed from the scoring scheme

\[ a_{jm} = \lambda'_{jm}(\theta), \quad \forall\, j, m. \tag{11} \]

The scoring scheme (11) is a technical device that will be used throughout this section.
The reader should not be misled into thinking that (11) is a scoring scheme that could
be applied to obtain a practical estimator as in Section 2 (to do so, we would already have
to know $\theta$!). Under EI and the bounds (4), we know that $\frac{1}{J}C_J(\theta) \to 0$, for all $\theta$, but the
behavior of $C_J(\theta)$ itself depends on the amount of local dependence in the item sequence
$X$. Under LI, of course, $C_J(\theta) = 0$.

To see the effect of $C_J(\theta)$ on (9) under EI, we may deduce from (6) that

\[ J^{1/2}(\hat\theta_J - \theta) \approx \frac{J^{-1/2}\,\ell'_J(\theta)}{\bar I_J(\theta)}, \tag{12} \]

in the sense that the asymptotic distributions of the left and right hand sides are the same.
If we can identify the asymptotic distribution of (a multiple of) $J^{-1/2}\ell'_J(\theta)$ then by (12),
we will also be able to identify the asymptotic distribution and rate of convergence of $\hat\theta_J$.
An indication of what is possible is provided by Theorem 5.1 below. Let us abbreviate


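\[
\sigma_J^2(\theta) = \mathrm{Var}(\bar A_J \mid \theta)
 = \frac{1}{J^2}\sum_{j=1}^{J} \mathrm{Var}(A_j \mid \theta) + \frac{2}{J^2}\sum_{i=1}^{J}\sum_{j=1}^{i-1} \mathrm{Cov}(A_i, A_j \mid \theta),
\tag{13}
\]

for whatever scoring scheme $\{a_{jm}\}$ is currently under consideration. Under the scoring
scheme (11), $\bar A_J - \bar A_J(\theta) = \frac{1}{J}\ell'_J(\theta)$ and $\sigma_J^2(\theta) = \frac{1}{J}[\bar I_J(\theta) + C_J(\theta)]$.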
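
The variance identity under scoring scheme (11) can be checked directly from (10) and (13). The derivation below is a sketch that assumes (5) defines the average information in its usual form, $\bar I_J(\theta) = \frac{1}{J}\sum_j\sum_m [P'_{jm}(\theta)]^2/P_{jm}(\theta)$, with $\lambda_{jm}(\theta) = \log P_{jm}(\theta)$; neither formula is restated in this section.

```latex
\begin{aligned}
E[A_j \mid \theta] &= \sum_{m} \lambda'_{jm}(\theta)\,P_{jm}(\theta) = \sum_{m} P'_{jm}(\theta) = 0, \\
\mathrm{Var}(A_j \mid \theta) &= \sum_{m} [\lambda'_{jm}(\theta)]^2\,P_{jm}(\theta)
  = \sum_{m} \frac{[P'_{jm}(\theta)]^2}{P_{jm}(\theta)}, \\
\sigma_J^2(\theta) &= \frac{1}{J^2}\cdot J\,\bar I_J(\theta)
  + \frac{1}{J}\cdot\frac{2}{J}\sum_{i=1}^{J}\sum_{j=1}^{i-1}\mathrm{Cov}(A_i, A_j \mid \theta)
  = \frac{1}{J}\,[\bar I_J(\theta) + C_J(\theta)].
\end{aligned}
```

The first line uses the fact that the category probabilities sum to one for every $\theta$, so their derivatives sum to zero.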

Theorem 5.1. Suppose that the conditions of Theorem 4.1 hold for the item sequence $X$
and the latent trait $\Theta$. Also, suppose that for some fixed $\theta$, the scoring scheme (11) yields


The  assertions  about  the  asymptotic  variances  in  (a),  (b),  and  (c)  follow  from  this  calcu¬ 
lation  by  choosing  R(J)  appropriately.  □ 

Conditions (a) and (b) of the theorem correspond to the familiar case in which the rate
of convergence of $\hat\theta_J$ to $\theta$ is $J^{-1/2}$. If $C_J(\theta) \to 0$ as $J \to \infty$, we get the usual asymptotically
normal and efficient result (9) for $\hat\theta_J$. Otherwise, we get subefficiency or superefficiency
depending on the sign of $C_J(\theta)$. The use of the terms efficient, subefficient, and superefficient
to describe the asymptotic variance as being equal to, greater than, or less than $\bar I_J^{-1}(\theta)$ is
suggestive here but perhaps misleading. In fact, $\bar I_J^{-1}(\theta)$ is the efficient variance only when
$\ell_J(\theta)$ is the true log-likelihood function. Under EI, some other (unknown) log-likelihood
function $L_J(\theta)$ applies, and examining the true efficiency of $\hat\theta_J$ would require access to the
(unknown) function $E[-L''_J(\theta) \mid \theta]$.

Condition (c) corresponds to the rate of convergence of $\hat\theta_J$ to $\theta$ being slower than $J^{-1/2}$.
This would happen, for example, if the inter-item covariances were generally positive and
sufficiently large to force $C_J(\theta)$ to be unbounded. Formally, there is also the possibility
that the convergence of $\hat\theta_J$ to $\theta$ could be faster than $J^{-1/2}$, but this would require that
$\bar I_J(\theta) + C_J(\theta) \to 0$, that is, $C_J(\theta)$ negative for all large $J$. As we will argue next, this seems
unlikely in many educational testing applications. Hence, this possibility was omitted from
the theorem statement.

For reasonably homogeneous tests, one intuitively expects that items not independent
given $\Theta$ would be positively correlated. This is certainly implicit in the factor-analytic
tradition of test theory (e.g., Anastasi, 1988, pp. 377 ff.). An example of the invocation of
this principle in IRT research is the design of the simulation study in Drasgow and Parsons
(1983). Indeed, it is quite reasonable to assume that if $X$ is essentially unidimensional with
respect to the trait $\Theta$, there are other traits $\Theta_2, \Theta_3, \ldots, \Theta_d$ such that LI holds with respect
to the $d$-dimensional trait vector $(\Theta, \Theta_2, \Theta_3, \ldots, \Theta_d)$ (see, for example, Stout, 1989). If
these traits are psychologically meaningful, it is also reasonable to assume that they will
be associated (see Holland and Rosenbaum, 1986, for a definition), given $\Theta = \theta$. In the
ordered-response case, a result of Jogdeo (1978) can be used to argue that conditional on
$\Theta = \theta$ alone, the inter-item covariances will be nonnegative (indeed, any ordered scoring
scheme for $X$, given $\theta$, will be associated). Thus, $C_J(\theta) \ge 0$ will generally be expected:
the variance of $J^{1/2}(\hat\theta_J - \theta)$ will generally be higher than $1/\bar I_J(\theta)$.

Theorem 4.2 gave a practical way to approximate the estimator $\hat\theta_J$. The following
corollary extends Theorem 5.1 to obtain asymptotic normality for this approximation.

Corollary 5.1. Suppose that the conditions of Theorem 5.1 hold, and that $\tilde\theta_J$ is any
estimator with $R(J)(\tilde\theta_J - \theta)$ bounded in probability. The Newton's method approximation
$\tilde\theta'_J$ of Theorem 4.2 based on $\tilde\theta_J$ satisfies

\[ R(J)(\tilde\theta'_J - \theta) \sim AN\!\left(0,\ \frac{R^2(J)}{J}\,\frac{\bar I_J(\theta) + C_J(\theta)}{\bar I_J(\theta)^2}\right). \]

Making appropriate choices of $R(J)$, we obtain the same three cases as in Theorem 5.1.
Proof: Using (11) and (13), we may rewrite (8) as

\[ R(J)(\tilde\theta'_J - \theta) \approx R(J)\,\frac{\bar A_J - \bar A_J(\theta)}{\sigma_J(\theta)} \cdot \frac{\{[\bar I_J(\theta) + C_J(\theta)]/J\}^{1/2}}{\bar I_J(\theta)} + R(J)(\tilde\theta_J - \theta)\,o_p(1). \]

The result follows from (14), since $R(J)(\tilde\theta_J - \theta)$ is bounded in probability. □

By definition, $R(J)(\tilde\theta_J - \theta)$ is bounded in probability if $P[\,R(J)\,|\tilde\theta_J - \theta| < B \mid \Theta = \theta\,]$
can be made arbitrarily close to 1 as $J \to \infty$ by choosing $B$ large enough. In Section 4
we suggested using $\tilde\theta_J = \bar A_J^{-1}(\bar A_J)$, for any convenient scoring scheme $\{a_{jm}\}$, as an initial
guess for Newton's method under LAD. Routine calculation using Chebyshev's inequality
shows that $R(J)(\tilde\theta_J - \theta)$ will then be bounded in probability as long as

\[ \frac{R^2(J)}{J^2}\sum_{i=1}^{J}\sum_{j=1}^{i-1} \mathrm{Cov}(A_i, A_j \mid \Theta = \theta) \ \text{ is bounded,} \tag{15} \]

as $J \to \infty$. This represents a strengthening of EI since $R(J)$ is a fixed, increasing function
of $J$ for all $\theta$. The assumption that (15) holds for scoring scheme (11) is also implicit in
Theorem 5.1. Hereafter, we will say that fast EI holds if (15) holds for a fixed rate $R(J)$
and every bounded scoring scheme $\{a_{jm}\}$.

Theorem 5.1 also assumes that the raw score $\bar A_J$ is asymptotically normal in the
sense of (14). Is this realistic? The following Central Limit Theorem (CLT) for dependent
random variables, easily deduced from Theorem 2.2 of Dvoretzky (1972), sheds light on
the qualitative side of this question.

Theorem 5.2. Suppose, for some fixed $\theta$ and some bounded scoring scheme $\{a_{jm}\}$:

(a) $J^2\sigma_J^2(\theta) \to \infty$;

(b) $\frac{1}{J\sigma_J(\theta)}\sum_{j=1}^{J} E[A_j - A_j(\theta) \mid \bar A_{j-1}, \theta] \to 0$; and

(c) $\frac{1}{J^2\sigma_J^2(\theta)}\sum_{j=1}^{J} \mathrm{Var}[A_j \mid \bar A_{j-1}, \theta] \to 1$,

as $J \to \infty$. Then, for this $\theta$ and $\{a_{jm}\}$,

\[ \frac{1}{\sigma_J(\theta)}[\bar A_J - \bar A_J(\theta)] \sim AN(0, 1). \tag{16} \]

The assumptions of Theorem 5.2 would be difficult to verify in practice, but this is
somewhat offset by the fact that they are intuitively meaningful; thus, we can at least ask
whether these assumptions are qualitatively appealing. Assumption (a) is merely a way of
ensuring that most items contribute significantly to $\bar A_J$. It is difficult to imagine a useful
item sequence or scoring scheme for which this would not be true. The conditioning in
(b) is not only on a fixed value $\theta$ of $\Theta$, but also on a fixed value of $\bar A_{j-1}$, for each $j$. If
the conditioning on $\bar A_{j-1}$ were dropped, (b) would become an exact equality. Under EI,
conditioning on $\Theta = \theta$ stabilizes $\bar A_{j-1}$ with high probability when $j$ is large (Theorem 3.1),
so we might expect that assumption (b) would hold for many EI item sequences. To gain
some intuition about assumption (c), we may rewrite it as

\[ \frac{\sum_{j=1}^{J} \mathrm{Var}(A_j \mid \theta) + 2\sum_{i=1}^{J}\sum_{j=1}^{i-1} \mathrm{Cov}(A_i, A_j \mid \theta)}{\sum_{j=1}^{J} \mathrm{Var}(A_j \mid \bar A_{j-1}, \theta)} \to 1. \tag{17} \]

Hence, recalling that the $A_j$ are bounded, (c) implies the fast EI condition,

\[ \frac{2}{J}\sum_{i=1}^{J}\sum_{j=1}^{i-1} \mathrm{Cov}(A_i, A_j \mid \theta) \ \text{ is bounded, as } J \to \infty. \tag{18} \]

This condition is almost ubiquitous in general CLT's for dependent random variables (e.g.,
Bradley, 1985). Note that (18) precludes applying Theorem 5.2 in the situation of Theorem
5.1 (c); moreover, from (17) we can see that some additional balancing between the variance
and covariance terms is needed for assumption (c) to hold. Example 5.2 below shows that
Theorem 5.1 (c) can nevertheless occur.

There is another way in which the assumptions (b) and (c) are not entirely innocuous.
EI and its strengthening (18) are second-order conditions (i.e., conditions that restrict only
the expected values (given $\theta$) of products of two item responses at a time). It is well known
that second-order conditions alone are not enough to guarantee (16). An example (without
reference to latent traits) is constructed by Bradley (1989, Section 2) of a dichotomous
sequence $X$ for which $X_i$ and $X_j$ are independent for every pair $i \ne j$ and yet the CLT
fails. (The reader is referred to Bradley's paper for the rather complicated construction;
also, see Bradley, 1985.) In light of the recent interest in Markov dependence among items
(e.g., Jannarone, 1986; Spray & Ackerman, 1986), it is intriguing to observe that Bradley's
example arises as a dichotomous scoring scheme for a Markov chain.

We conclude this section with two simpler examples illustrating the practical effects
that item dependence can have on the standard error of $\hat\theta_J$. The examples are both
variations of the paragraph comprehension example of Stout (1990; Example 2.3). Section
4.2 of Rosenbaum (1988) is also relevant. More complicated examples and/or examples in
other realistic settings might also be constructed.

Example 5.1. Suppose $X_1, X_2, X_3, \ldots$ are binary item response variables, having the same
response curve $P_j(\theta) = \theta$ (so the latent scale is the interval $(0, 1)$ and $P[X_j = 1 \mid \theta] = \theta$).
Moreover, suppose that the items are arranged in successive groups of $g_0$ items as

$X_1, X_2, \ldots, X_{g_0}$;

$X_{g_0+1}, \ldots, X_{2g_0}$;

etc.,

such that different groups of $g_0$ items are independent of one another, given $\theta$, and items
within a single group are positively correlated, given $\theta$. For simplicity, we will take

\[ \mathrm{Corr}(X_i, X_j \mid \theta) = \begin{cases} c & \text{if } X_i \text{ and } X_j \text{ are in the same group,} \\ 0 & \text{if not,} \end{cases} \]

for some fixed $c \in (0, 1]$. This is a naive model for a paragraph comprehension test in which
several paragraphs are presented and $g_0$ questions are asked for each paragraph. Here, $\theta$
represents a trait common to all the items, which we might wish to think of as reading
comprehension; and the nonzero correlations are induced by nuisance traits, for example,
specific knowledge about the subject matter of the paragraph at hand.

It is straightforward to verify that EI holds for this sequence of items; that is, for
any bounded scoring scheme $\{a_{jm} : m = 0, 1;\ j = 1, 2, \ldots\}$ generating item scores
$A_j = (1 - X_j)a_{j0} + X_j a_{j1}$,

\[ \frac{1}{J^2}\sum_{i=1}^{J}\sum_{j=1}^{i-1} \mathrm{Cov}(A_i, A_j \mid \theta) \to 0. \]

Moreover, it can be verified (via Theorem 5.2 or by applying the usual CLT to the paragraph
scores $G_k = \sum_{j=(k-1)g_0+1}^{kg_0} A_j$) that for any bounded scoring scheme $\{a_{jm}\}$ for which
$J\sigma_J(\theta) \to \infty$,

\[ \frac{1}{\sigma_J(\theta)}[\bar A_J - \bar A_J(\theta)] \sim AN(0, 1), \]

given $\Theta = \theta$. Now, for the scoring scheme (11), the item scores are $A_j = (X_j - \theta)/[\theta(1 - \theta)]$
so that $\mathrm{Cov}(A_i, A_j \mid \theta) = c/[\theta(1 - \theta)]$ if $A_i$ and $A_j$ are in the same group, and 0 otherwise.
Letting $k_J$ be the greatest integer less than or equal to $J/g_0$, we see that

\[ C_J(\theta) = \frac{2}{J}\,k_J \binom{g_0}{2} \frac{c}{\theta(1 - \theta)} \approx \frac{c(g_0 - 1)}{\theta(1 - \theta)}, \]

and is bounded but nonzero as $J \to \infty$. Hence, using scoring scheme (11),

\[ J^{1/2}(\hat\theta_J - \theta) \approx J^{1/2}\,\frac{\sigma_J(\theta)}{\bar I_J(\theta)} \cdot \frac{\bar A_J - \bar A_J(\theta)}{\sigma_J(\theta)} \]

and is asymptotically normal with mean 0 and variance

\[ \frac{J\sigma_J^2(\theta)}{\bar I_J(\theta)^2} \approx \frac{\dfrac{1}{\theta(1 - \theta)} + \dfrac{c(g_0 - 1)}{\theta(1 - \theta)}}{\left[\dfrac{1}{\theta(1 - \theta)}\right]^2} = \theta(1 - \theta)\{1 + c(g_0 - 1)\}. \]

Note that the deviation from the efficient variance $\theta(1 - \theta)$ is indeed due to dependence
among the $g_0$ items related to the same paragraph, and that $C_J(\theta) \approx c(g_0 - 1)/[\theta(1 - \theta)]$
appropriately characterizes this deviation. This illustrates part (b) of Theorem 5.1. □

The situation can be understood intuitively as follows: when items in a group are
positively correlated given $\theta$, a particular response to one item in the group is likely to
be duplicated in responses to other items in the same group. Thus, a wrong response is
likely to bias the $\theta$-estimate downward more than is usual, and a right response is likely
to bias the estimate upward, the biasing effect being magnified by the size of the group.
This inflates the effect of noise inherent in the $\theta$-estimation problem.
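
The arithmetic of Example 5.1 can be reproduced numerically. The Python sketch below counts within-group pairs exactly and evaluates the index from (10) and the resulting asymptotic variance; the function names are hypothetical, and, unlike the approximation in the example, an incomplete last group is included in the pair count.

```python
def c_index_grouped(J, g0, c, theta):
    """C_J(theta) for Example 5.1 under scoring scheme (11): groups of g0 items."""
    cov_same_group = c / (theta * (1.0 - theta))  # Cov(A_i, A_j | theta) within a group
    full_groups, remainder = divmod(J, g0)
    # number of item pairs (i, j), j < i, that fall in the same group
    pairs = full_groups * g0 * (g0 - 1) // 2 + remainder * (remainder - 1) // 2
    return 2.0 / J * pairs * cov_same_group

def asymptotic_variance(J, g0, c, theta):
    """Variance of J^(1/2) (theta_hat - theta): (I + C) / I^2 with I = 1 / (theta (1 - theta))."""
    info = 1.0 / (theta * (1.0 - theta))  # average information when P_j(theta) = theta
    return (info + c_index_grouped(J, g0, c, theta)) / info ** 2
```

When `J` is a multiple of `g0`, the two functions agree with the closed forms $c(g_0 - 1)/[\theta(1 - \theta)]$ and $\theta(1 - \theta)\{1 + c(g_0 - 1)\}$ up to rounding.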

Example 5.2. Now let the sizes of the groups of mutually dependent items increase. We
take dichotomous items $X_1, X_2, \ldots$ with identical ICC's $P_j(\theta) = \theta$ as before, but now group
them as follows:

$X_1, X_2, \ldots, X_{g(1)}$;

$X_{g(1)+1}, \ldots, X_{g(1)+g(2)}$;

etc.,

where  g(k)  is  a  nondecreasing  function  of  k.  For  specificity,  we  will  take  g(k)  =  k*.  Once 
again, each group of $g(k)$ items is independent of the other groups, and for simplicity, we
take $\mathrm{Corr}(X_i, X_j) = c$ for $X_i$ and $X_j$ in the same group. We can verify that EI holds for
this sequence of items, and apply Liapunov's Central Limit Theorem (Serfling, 1980, p.
30) to the paragraph scores

In this example, the groups of dependent items become so large that the magnified
effects of individual item responses have actually slowed the rate of convergence of $\hat\theta_J$ to
$\theta$. These magnified effects would be present in any $\theta$-estimation method that ignored the
nature of the inter-item dependencies. However, this need not be an argument against
using estimation methods that assume local independence when this does not hold. The
real lesson is that if one wants to continue to use a familiar estimator like $\hat\theta_J$ even though
LI may fail, then one must be able to qualitatively justify an asymptotic distribution
assumption like (14), and to quantitatively estimate $C_J(\theta)$ so that realistic standard errors
of estimation can be calculated, etc. Note that in Example 5.2, $C_J(\theta)$ is unbounded as
$J \to \infty$, but EI still holds; the unboundedness of $C_J(\theta)$ is responsible for the slower rate
of convergence of $\hat\theta_J$ to $\theta$. If $C_J(\theta)$ grows too fast as $J \to \infty$ then EI itself can also fail.
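
The contrast between an unbounded $C_J(\theta)$ and a vanishing $C_J(\theta)/J$ can be seen numerically. The Python sketch below assumes, purely for illustration, the growth rule $g(k) = k$ (the exponent used in the example is not legible in this copy) and the within-group covariance $c/[\theta(1 - \theta)]$ of scoring scheme (11); the function name and the `group_size` parameter are hypothetical.

```python
def c_index_growing(num_groups, c, theta, group_size=lambda k: k):
    """Return (J, C_J(theta)) for groups of sizes group_size(1), ..., group_size(num_groups).

    group_size defaults to g(k) = k, an assumption made for this sketch; it must
    return positive integers.
    """
    sizes = [group_size(k) for k in range(1, num_groups + 1)]
    J = sum(sizes)
    # within-group pairs (i, j) with j < i, summed over groups
    pairs = sum(g * (g - 1) // 2 for g in sizes)
    C = 2.0 / J * pairs * c / (theta * (1.0 - theta))
    return J, C
```

Under these assumptions, increasing `num_groups` makes `C` grow roughly in proportion to the number of groups while `C / J` shrinks, which is the pattern the paragraph above describes.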

The quantity $C_J(\theta)$, or perhaps its average value over all $\theta$'s, should be viewed as an
index of departure from local independence, locating collections of items (tests) along a
continuum of unidimensional behavior from strictly locally independent unidimensional,
$d_L = 1$, situations to dramatically non-unidimensional, $d_E > 1$, situations. This suggests
the following model fit/trait estimation taxonomy, based upon the index $C_J(\theta)$ (contingent,
of course, upon the qualitative acceptance of (14)):

I. $C_J(\theta) \approx 0$ for all realistic $\theta$'s. In this situation, ability estimation based on a
$d_L = 1$ model could proceed as usual, using familiar standard errors such as $\bar I_J(\hat\theta)^{-1/2}$.
This situation covers both $d_L = 1$ settings and those essentially unidimensional,
$d_E = 1$, settings that only mildly violate LI.

II. $C_J(\theta) \ne 0$ but moderate in size for all realistic $\theta$'s. Here, $\Theta$-estimation procedures
based on LI could still be used but the conventional standard errors would have to be
replaced by $(\bar I_J(\theta) + C_J(\theta))^{1/2}/\bar I_J(\theta)$. This would be the usual $d_E = 1$ setting.

III. $C_J(\theta) \ne 0$ of substantial size for many $\theta$'s. This would suggest that there is
so much residual variability in the data after conditioning on $\Theta$, that some genuinely
multidimensional latent trait model may be needed.
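
For cases I and II, the adjusted standard error is a one-line computation once $\bar I_J$ and $C_J$ are available. The Python sketch below is a minimal illustration; the function name is hypothetical, and it reports the standard error of $\hat\theta_J$ itself by dividing the $J^{1/2}$-scale expressions above by $J^{1/2}$, which is an interpretation adopted for this example.

```python
import math

def standard_error(J, avg_info, c_index=0.0):
    """Approximate standard error of theta_hat for a test of J items.

    avg_info is the average information and c_index the dependence index C_J(theta).
    With c_index = 0 this reduces to the conventional (J * avg_info) ** -0.5.
    """
    return math.sqrt((avg_info + c_index) / J) / avg_info
```

For instance, with `J = 100` and `avg_info = 4.0`, a dependence index of `5.0` raises the standard error from 0.05 to 0.075.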

Of course, the practical use of such a taxonomy rests on effective estimation of $C_J(\theta)$
itself. Work recently completed by Nandakumar and Stout (1989) aims at developing a
practical index of EI for binary items, related to $C_J(\theta)$ but not adapted to the task of trait
estimation. In particular, they investigate empirically the extent to which $d_E = 1$ holds
or fails in the paragraph comprehension setting, as the number of items per paragraph
increases.

Another approach to estimating $C_J(\theta)$ is suggested by the work of Gibbons, Bock,
and Hedeker (1989). With the help of a computational device called the modified Clark
algorithm, they are able to factor-analyze binary items, assumed to have normal ogive
response curves, with correlated specific factors. $C_J(\theta)$ can then be estimated from the
common factor loadings and specific factor correlations, at least when their one-factor
solution leads to the same latent trait as identified in the definition of $d_E = 1$.


6. Application to the Dichotomous Case

In the binary (dichotomous) case, in which $X_j$ takes the value 0 or 1 depending on
the examinee's answer to the $j$th item, the $d_L = 1$ likelihood is

\[ P[\mathbf{X}_J = \mathbf{x}_J \mid \Theta = \theta] = \prod_{j=1}^{J} P_j(\theta)^{x_j}\,[1 - P_j(\theta)]^{1 - x_j}, \tag{19} \]

with monotone item characteristic curves (ICC's) $P_j(\theta) = P[X_j = 1 \mid \Theta = \theta]$. Let us
assume only that EI and LAD hold with respect to $\Theta$. The definitions and theorems
presented in Section 3 all specialize to the dichotomous setting, and in fact, most were
introduced in this setting by Stout (1990). The MLE must solve the likelihood equation,

\[ 0 = \ell'_J(\hat\theta_J) = \sum_{j=1}^{J} \lambda'_j(\hat\theta_J)\,[X_j - P_j(\hat\theta_J)], \tag{20} \]

where $\lambda_j(\theta) = \log [P_j(\theta)/(1 - P_j(\theta))]$ (the use of the log-odds-ratios $\lambda_j$ is equivalent to using
the log-category-probabilities $\lambda_{j0}$ and $\lambda_{j1}$ from Section 4, and avoids summation over the
$n = 2$ response categories). As before, boundedness of $\lambda'_j(\theta)$ together with EI guarantees
that $\frac{1}{J}\ell'_J(\theta)$ converges to zero, given $\Theta = \theta$. More precisely, we will assume that, for all $\theta$,
there exists an interval $B_\theta$ containing $\theta$ and a constant $M_\theta < \infty$, such that

\[ |\lambda'''_j(t)| \le M_\theta \quad \forall\, t \in B_\theta,\ \forall\, j. \tag{21} \]

To complete a proof of consistency, we again need to bound the test information function
as

\[ \bar I_J(\theta) = \frac{1}{J}\sum_{j=1}^{J} \lambda'_j(\theta)\,P'_j(\theta) \ge c_\theta > 0, \tag{22} \]

as $J \to \infty$, and as in Proposition 4.1, LAD is a sufficient but not necessary condition
to achieve this. Note that the information function in (22) is precisely the same one
introduced in (5) for $n = 2$ response categories.

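Theorem 6.1. Let $X$ be a dichotomous item sequence satisfying EI, (21), and (22). Then
there exists a sequence $\{\hat\theta_J : J \ge J_\theta\}$ of roots of (20) such that

\[ \lim_{J \to \infty} P[\,|\hat\theta_J - \theta| < \epsilon \mid \Theta = \theta\,] = 1, \]

for every $\epsilon > 0$.

Theorem 6.2. Suppose the assumptions of Theorem 6.1 hold, and let $\tilde\theta_J$ be any sequence
of consistent estimates of $\theta$, given $\Theta = \theta$. Then, the Newton's method improvement,

\[ \tilde\theta'_J = \tilde\theta_J - \frac{\ell'_J(\tilde\theta_J)}{\ell''_J(\tilde\theta_J)}, \]

is also consistent for $\theta$.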

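An obvious candidate for the initial guess in Theorem 6.2 is $\tilde\theta_J = \bar P_J^{-1}(\bar X_J)$. From
(20) and the above results, we see again that the consistency and asymptotic distribution
of $\hat\theta_J$ are tied up with the behavior of the centered weighted averages

\[ \frac{1}{J}\ell'_J(\theta) = \bar A_J - \bar A_J(\theta) = \frac{1}{J}\sum_{j=1}^{J} a_j\,[X_j - P_j(\theta)], \tag{23} \]

with $a_j = \lambda'_j(\theta)$, where again the dependence of $a_j$ on $\theta$ does not matter since $\theta$ is fixed.

Once again, let

\[
\sigma_J^2(\theta) = \mathrm{Var}(\bar A_J \mid \theta)
 = \frac{1}{J^2}\sum_{j=1}^{J} a_j^2\,P_j(\theta)[1 - P_j(\theta)] + \frac{2}{J^2}\sum_{i=1}^{J}\sum_{j=1}^{i-1} a_i a_j\,\mathrm{Cov}(X_i, X_j \mid \Theta = \theta),
\]

and let $C_J(\theta) = (2/J)\sum_{i=1}^{J}\sum_{j=1}^{i-1} \lambda'_i(\theta)\lambda'_j(\theta)\,\mathrm{Cov}(X_i, X_j \mid \theta)$.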

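Theorem 6.3. Suppose that the assumptions of Theorem 6.1 hold for the item sequence $X$
and the latent trait $\Theta$. Also suppose, given $\Theta = \theta$, that in (23),

\[ \frac{1}{\sigma_J(\theta)}[\bar A_J - \bar A_J(\theta)] \sim AN(0, 1). \tag{24} \]

Finally, suppose $R(J)$ is a function for which $R^2(J)C_J(\theta)/J$ remains bounded. Then,

\[ R(J)(\hat\theta_J - \theta) \sim AN\!\left(0,\ \frac{R^2(J)}{J}\,\frac{\bar I_J(\theta) + C_J(\theta)}{\bar I_J(\theta)^2}\right). \]

Moreover, if $\tilde\theta_J$ is any estimator for which $R(J)(\tilde\theta_J - \theta)$ is bounded in probability, $\tilde\theta'_J$ from
Theorem 6.2 is also asymptotically normal with the same asymptotic variance.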

Once (24) is deemed qualitatively acceptable, the asymptotic behavior of $\hat\theta_J$ is
determined by $C_J(\theta)$. When $C_J(\theta)$ is near zero, we can expect the items to behave as though
LI were true; when $C_J(\theta)$ is much larger, we should expect item behavior that can be
effectively analyzed only with a multidimensional model.
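
In the dichotomous case the index is a weighted sum of conditional covariances, so it can be computed directly whenever those covariances are known or have been estimated. The Python sketch below is a minimal illustration; the function name and the nested-list input format are assumptions, and obtaining the conditional covariances in practice is a separate estimation problem not addressed here.

```python
def c_index(lam_prime, cov):
    """C_J(theta) = (2/J) * sum over j < i of lambda'_i * lambda'_j * Cov(X_i, X_j | theta).

    lam_prime: list of lambda'_j(theta) values, one per item.
    cov: J x J nested list of conditional covariances Cov(X_i, X_j | theta);
         only entries below the diagonal are used.
    """
    J = len(lam_prime)
    total = 0.0
    for i in range(J):
        for j in range(i):
            total += lam_prime[i] * lam_prime[j] * cov[i][j]
    return 2.0 * total / J
```

Applied to the grouped items of Example 5.1, with $\lambda'_j(\theta) = 1/[\theta(1 - \theta)]$ and within-group covariance $c\,\theta(1 - \theta)$, this reproduces $c(g_0 - 1)/[\theta(1 - \theta)]$ when $J$ is a multiple of $g_0$.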

7.  Discussion 

In assessing the shortcomings of the traditional local independence approach to item
response modeling, Drasgow and Parsons (1983, p. 198) conclude, “it seems clear that
researchers should be more concerned with the robustness of estimation techniques to
minor violations of dimensionality assumptions than with the possibly neverending task
of measuring all latent variables that underlie responses in a particular content domain.”
This call for the study of structural robustness in IRT is compelling: although violations of
strictly unidimensional latent structure can sometimes be explicitly modeled and exploited,
many situations call for a unidimensional approach that is tolerant of minor violations of
strict unidimensionality.

In this paper we have extended Stout's modeling notion of essential independence, EI,
for binary items (Stout, 1987; 1990) to polytomous item sequences. Essential independence
permits some dependence among items such as would be caused by minor violations of
local independence, LI, due to nuisance trait multidimensionality, but still allows a single
dominant latent trait to be identified. This type of mild interitem local dependence is
arguably more realistic than unidimensional local independence models, $d_L = 1$, for many
currently used ability and achievement tests.

For items in which the expected amount of credit awarded increases with the latent
trait, we have developed a theory of ability estimation under EI that closely parallels Stout
(1990). As in Stout's dichotomous response theory, monotonicity need not be assumed
for the individual items, but rather only for the test characteristic curve. Under this
aggregate monotonicity condition, called local asymptotic discrimination, LAD, we have
shown that the transformation $\bar A_J^{-1}(\bar A_J)$ of the raw test score is a consistent estimator of
each examinee's $\theta$ as the test length $J$ grows. A definition of essential unidimensionality,
$d_E = 1$, was proposed based on EI and LAD holding with respect to a unidimensional trait
$\Theta$.

An alternative to scoring the items using an ad hoc scoring scheme $\{a_{jm}\}$ (which leads
to the test scores $\bar A_J$ above) is to ignore the local dependence among the items and employ
a well-known LI-based estimation procedure. Since it is common to use an LI model even
when LI is believed to be only approximately true, the behavior of such a procedure in the
more realistic EI setting is an important issue, as Drasgow and Parsons attest above.
Maximum likelihood estimation of $\theta$ was examined in this light.

The MLE $\hat\theta_J$ based on a unidimensional LI model was shown to be consistent for each
examinee's $\theta$ as the test length $J$ grows, when only EI and not LI holds. In this sense
$\hat\theta_J$ is robust as an estimator of $\theta$ under this realistic structural violation of LI. When an
estimator such as $\hat\theta_J$ is found to be consistent, its precision as an estimator is usually
judged by the theoretical asymptotic distribution of $J^{1/2}(\hat\theta_J - \theta)$. Under LI, we expect this
to be asymptotically normal with mean zero and variance $1/\bar I_J(\theta)$ as $J \to \infty$, where $\bar I_J(\theta)$
is the test information function. When $\hat\theta_J$ is based on an LI model but only EI holds,
this asymptotic distribution may fail in various ways: the rate $J^{1/2}$ may be preserved
but the variance may be inflated by an essentially constant amount; the rate $J^{1/2}$ may
fail; and finally, it is conceivable that asymptotic normality itself fails, with any rate of
convergence. Hence, the robustness of consistency for the MLE does not extend to a
robustness of asymptotic distribution, under EI violations of LI.

Conditions for asymptotic normality of $\hat{\theta}_J$ involve higher product-moment assumptions
that do not admit easy rigorous checks from the data. Hence, asymptotic normality
itself is usually a qualitative issue that must be decided by the practitioner in each
application. If asymptotic normality is qualitatively acceptable, the correct variance can be
calculated with the help of the expression


$$C_J(\theta) = \frac{2}{J} \sum_{i=1}^{J-1} \sum_{j=i+1}^{J} \operatorname{Cov}(A_i, A_j \mid \theta),$$

where the $A_j$'s score each response category according to the derivative of the log-category-
probability: $A_j = \frac{\partial}{\partial\theta} \log P[X_j = m \mid \theta]$ when $X_j = m$. In principle, $C_J(\theta)$ could be
positive or negative; however, in many educational testing settings, we expect it to be
positive. Under EI, $C_J(\theta)/J \to 0$, but $C_J(\theta)$ itself need not tend to zero.
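
To make the last remark concrete, the sketch below computes the index from a given conditional covariance matrix, reading the index as 2/J times the sum of Cov(A_i, A_j | θ) over item pairs i < j. The covariance structure is entirely hypothetical: unit variances and a common covariance `rho` within consecutive blocks of `k` items, with zero covariance between blocks. The names `dependence_index` and `block_cov` are illustrative assumptions, not constructions from the paper.

```python
import numpy as np

def dependence_index(cov):
    """(2/J) * sum over i < j of cov[i, j], for a J x J conditional covariance matrix."""
    cov = np.asarray(cov, dtype=float)
    J = cov.shape[0]
    upper = np.triu(cov, k=1)  # keeps only entries with i < j
    return 2.0 * upper.sum() / J

def block_cov(J, k, rho):
    """Hypothetical covariance: rho within consecutive blocks of k items, 0 elsewhere."""
    block = np.arange(J) // k
    cov = np.where(block[:, None] == block[None, :], float(rho), 0.0)
    np.fill_diagonal(cov, 1.0)  # unit variances (assumed)
    return cov

# The index for growing test lengths with fixed block size and covariance.
for J in (20, 200, 2000):
    C = dependence_index(block_cov(J, k=4, rho=0.3))
    print(J, C, C / J)
```

Under this assumed structure the index equals (k - 1) * rho whenever J is a multiple of k, so it stays fixed as J grows while the index divided by J shrinks toward zero. This is the pattern in which the variance can be inflated by an essentially constant amount.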

For fixed $J$, the quantity $C_J(\theta)$ should be viewed as an index of local item dependence
along a continuum that connects strictly $d_L = 1$ unidimensional models with strictly
$d_E > 1$ multidimensional models. Such a continuum has also been suggested by Drasgow
and Parsons (1983). The $d_E = 1$ unidimensional models, which are the focus of this paper,
form the middle of this continuum. The nearer $C_J(\theta)$ is to zero, the more we can expect
latent trait estimation to behave as though LI were true. The larger $C_J(\theta)$, the more
we should expect item behavior that can be effectively analyzed only with an explicitly
multidimensional latent trait model. Thus, if $C_J(\theta)$ could be effectively estimated in
practice, we would be able to use it to predict the behavior of $\hat{\theta}_J$. Various ideas for doing
this are provided by Wainer and Wright (1980), Gibbons, Bock, and Hedeker (1989), and
Nandakumar and Stout (1989). For this reason, the non-robustness of the distribution of $\hat{\theta}_J$
need not be defeating.

The principal assumptions needed to establish consistency of the LI-based MLE were
EI and that the information function $I_J(\theta)$ (calculated as though LI were true) be bounded
away from 0 and $\infty$. Indeed, a hierarchy of identifiability conditions for estimating $\theta$ can be
developed, starting with cumulative RCC monotonicity M (i.e., ordered-response items),
moving through test characteristic curve monotonicity LAD, to the bounding of $I_J(\theta)$ away
from 0. Each of these conditions in some sense implies the next, and all allow various forms
of unidimensional latent trait estimation. This hierarchy illustrates the transition from
highly interpretable but very restrictive conditions, such as M, to less restrictive conditions
that do not admit easy psychometric interpretation, such as the bounding conditions on
$I_J(\theta)$.

Essential independence plays a central role in the convergence of $\hat{\theta}_J$ to $\theta$ because it
guarantees the stability of certain weighted averages of item scores that appear in the
LI-based log-likelihood. Therefore, we might expect that under EI and suitable regularity
conditions, other estimators that depend on the stability of the LI-based log-likelihood
would also be consistent estimators of $\theta$. Indeed, a trivial modification of the proof of
Theorem 4.1 shows that the posterior mode, which maximizes the posterior density

$$f_J(\theta \mid X_J) = \frac{P[X_J = x_J \mid \theta]\, f(\theta)}{P[X_J = x_J]},$$

is consistent for $\theta$ under the conditions of that theorem and a mild nondegeneracy
condition on the density $f(\theta)$ of $\Theta$ in the examinee population. The posterior mode has
been considered by Samejima (1969) and by Lord (1986), for example. A different set of
regularity conditions from those employed in Theorem 4.1, which are equally plausible in
applications, can be used to obtain consistency of the posterior mean,

$$E[\Theta \mid X_J] = \int \theta\, f_J(\theta \mid X_J)\, d\theta.$$

Essential independence is used here to ignore the part of the integral away from the value
$\theta_0$ that generated the data $X_J$; see, for example, the proof of equation (5) in Walker (1969).
The regularity conditions needed generalize Walker's conditions, and incidentally provide
another proof of consistency of the MLE. The posterior mean has been considered by, for
example, Bock and Mislevy (1982) as well as earlier by Samejima (1969).
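
As a rough computational illustration of the two Bayes estimators just discussed, the posterior mode and posterior mean can be approximated on a grid of trait values using the LI-form likelihood, that is, the product of category probabilities over items. Everything specific in this sketch is an assumption made for illustration: the partial-credit-type item model, the item parameters, the standard normal prior density, the response vector, and the function names `category_probs`, `grid_posterior`, and `mode_and_mean`.

```python
import numpy as np

def category_probs(theta, a, b):
    """Category probabilities for one hypothetical partial-credit-type item.

    theta: trait values, shape (G,); a: positive slope; b: step locations, shape (M-1,).
    Returns an array of shape (G, M) whose rows sum to 1.
    """
    theta = np.asarray(theta, dtype=float)
    steps = a * (theta[:, None] - np.asarray(b, dtype=float)[None, :])
    z = np.concatenate([np.zeros((theta.size, 1)), np.cumsum(steps, axis=1)], axis=1)
    z -= z.max(axis=1, keepdims=True)  # guard against overflow
    p = np.exp(z)
    return p / p.sum(axis=1, keepdims=True)

def grid_posterior(x, items, grid, prior):
    """Normalized posterior weights on the grid, from the LI-form log-likelihood."""
    loglik = np.zeros(grid.shape)
    for x_j, (a, b) in zip(x, items):
        # Add the log-probability of the observed category for each item.
        loglik += np.log(category_probs(grid, a, b)[:, x_j])
    w = np.exp(loglik - loglik.max()) * prior
    return w / w.sum()

def mode_and_mean(post, grid):
    """Grid approximations to the posterior mode and the posterior mean."""
    return float(grid[np.argmax(post)]), float(np.sum(grid * post))

# Hypothetical test: ten identical three-category items; not taken from the paper.
items = [(1.0, [-0.5, 0.5])] * 10
grid = np.linspace(-4.0, 4.0, 801)
prior = np.exp(-0.5 * grid**2)  # unnormalized standard normal density (assumed)

x = [0, 1, 2, 2, 1, 2, 1, 2, 2, 1]  # illustrative response vector
post = grid_posterior(x, items, grid, prior)
theta_mode, theta_mean = mode_and_mean(post, grid)
```

The sketch deliberately ignores any local dependence among the items, which is the situation the consistency results address: the estimators are computed as though LI held, whatever the true dependence structure of the data.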

Essential independence is thus seen to be a minimal condition under which strictly
$d_L = 1$ trait estimation procedures may be expected to work when applied to mildly
multidimensional data. Our examination of essential independence in the polytomous item
response setting shows that this condition is not an artifact of the simple structure of
dichotomously-scored tests, but a general condition that can be fruitfully applied to
standardized tests of all sorts. Moreover, we have shown that a rigorous approach to the
structural robustness analysis advocated by Drasgow and Parsons (1983) is possible. Locally
independent latent trait models can, and should, continue to be used to develop estimation
and decision procedures in IRT, if for no other reason than their analytic simplicity.
However, before LI-based procedures are applied on-line, they should be thoroughly examined
under the more realistic assumption of essential independence.